(54) Title: METHOD AND APPARATUS FOR SIMULTANEOUSLY ACCESSING A PLURALITY OF DISPERSED DATABASES

(57) Abstract

A method and apparatus for intelligent Internet searching (16), the apparatus residing as a
software application on a user computer (the client) (12). A single subject database is stored
on the client and accessed by the application. The majority of the single subject database
content comprises a hierarchical listing of "hidden" web databases, all entries being organized
by subject matter and each including a description of database content and a search term entry
interface customized for the particular database access page format. There are also entries in
the database which provide an interface to search engines hosted at a dedicated search server.
Each database is preferably updated at a regular interval, such as monthly or weekly, via
remote download from a server on the WAN, or by other data transport means. A plurality of
simultaneous hidden database searches may be performed by the application by linking the
client to the appropriate database access pages on the network and forwarding the user's
desired search information. Search results are cached on the user computer for comparison to
newly found search results, allowing for easy sorting of new and old data and differentiated
display to the user. Desired keywords are preferably cached and shared among database search
interfaces.
METHOD AND APPARATUS FOR SIMULTANEOUSLY
ACCESSING A PLURALITY OF DISPERSED DATABASES

Field of the Invention

The present invention is related generally to the field of database searching, and more 
specifically to simultaneous searching for data across a wide area network such as the Internet, 
the network including a plurality of clients and servers and a plurality of databases. 

Background of the Invention

A wide area computer network, or WAN, comprises a geographically dispersed,
interconnected plurality of computers capable of sharing data and/or processing capacity. The 
Internet is the world's largest WAN, growing at an annual rate some estimate to be above one 
thousand percent. In March of 1998, there were an estimated 320 million pages of 
information posted on the World Wide Web (the graphics-capable portion of the Internet), 
with uncounted millions of gigabytes of additional information stored in non-Web based, 
though Web accessible, databases. For the purpose of describing the present invention, 
information obtained through the Web, for example presented in Hyper Text Markup 
Language (HTML) and available at a consistent Uniform Resource Locator (URL) is within 
the "visible" web and is termed "directly accessible." Conversely, information accessible only 
via access to a distinct portal or other electronic doorway (even if such a portal or doorway is 
found on the Web) is within the hidden or "invisible" web and is termed "indirectly 
accessible." While there are numerous search engines and "web crawlers" that may be used to 
search for directly available data on the visible web, there is presently no singular source for 
accessing the indirectly available information on the hidden web. The present invention 
addresses the need for an efficient method of finding data on a large scale WAN such as the 
Internet, including the visible and hidden portions of the World Wide Web, and the need to 
efficiently update found information as content evolves and grows. 
Summary of the Invention

To address the shortcomings of the available art, the present invention provides an
intelligent WAN searching apparatus which resides on a user's computer. A single subject
database (e.g., a healthcare database), or a plurality of single subject databases, is stored on
the client and accessed by the application. The majority of the single subject database entries
comprise a hierarchical listing of hidden web databases, all entries being organized by subject
matter and each including a description of database content and a search term entry interface
customized for the particular database access page format. A user may establish a single query
that the application then broadcasts to each desired hidden database to obtain indirectly
accessible information. The results of the query are cached on the user's computer and
displayed, preferably in HTML format. There are also listings in the database which provide
an interface to search engines hosted at a dedicated search server. Each of these search
engines includes a subject matter-limited listing of visible web sites that are particularly
relevant to the database's subject. Thus, the user's query can be broadcast through the
dedicated search server to obtain directly accessible information from the visible web. The
search results of the visible web sites can then be displayed in HTML format similar to the
results of hidden web searches. Each database is preferably updated at a regular interval, such
as monthly or weekly, via remote download from a server on the WAN, or by other data
transport means.
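
As an illustration only, the single subject database described above could be modeled as a small catalog of entries, each carrying a description and the details needed to format a query for that source. The class names, field names, and the example URL below are assumptions made for this sketch; the specification does not define a storage format.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class SourceEntry:
    """One catalog entry: what a source contains and how to query it (hypothetical layout)."""
    name: str
    description: str
    search_url: str   # address of the source's search page (placeholder value in the example)
    term_param: str   # form field that carries the user's search terms
    extra_params: dict = field(default_factory=dict)  # fixed fields the form also expects

    def build_query_url(self, terms: str) -> str:
        """Format the user's terms the way this source's search form expects."""
        params = dict(self.extra_params)
        params[self.term_param] = terms
        return f"{self.search_url}?{urlencode(params)}"


@dataclass
class SubjectCatalog:
    """A single subject database: entries grouped under subject categories."""
    subject: str
    categories: dict = field(default_factory=dict)  # category name -> list of SourceEntry

    def add(self, category: str, entry: SourceEntry) -> None:
        self.categories.setdefault(category, []).append(entry)

    def entries(self, category: str) -> list:
        return list(self.categories.get(category, []))


catalog = SubjectCatalog("healthcare")
catalog.add("Gastroenterology", SourceEntry(
    name="Example Medical Index",
    description="Placeholder entry used only for this sketch.",
    search_url="http://example.org/search",
    term_param="q",
    extra_params={"db": "med"},
))
url = catalog.entries("Gastroenterology")[0].build_query_url("Crohn's")
print(url)
```

Keeping the query format inside each entry is what lets one user query be reused across sources whose search pages differ.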

A plurality of simultaneous hidden database searches may be performed by the client
application to the extent connection bandwidth is available for linking the client to the
appropriate database access pages and forwarding the user's desired search information.
Preferably, search results from both hidden and visible web searches are cached on the user's
computer for comparison to newly found search results, allowing for easy sorting of new and
old data and differentiated display to the user. Desired keywords are preferably cached and
shared among database search interfaces.
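
The simultaneous searches described above can be sketched with a bounded pool of worker threads, where the pool size stands in for the available connection bandwidth. This is a minimal sketch under stated assumptions: the function names, the source names, and the URL templates are hypothetical, and `fake_fetch` replaces a real network request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def broadcast_query(terms, sources, fetch, max_workers=4):
    """Send one query to several sources at once and collect results by source name.

    `sources` maps a source name to a query URL template (hypothetical).
    `fetch` is any callable that takes a URL and returns the result text.
    `max_workers` caps concurrency, standing in for the bandwidth limit.
    """
    def run(item):
        name, template = item
        try:
            return name, fetch(template.format(terms=terms))
        except Exception as exc:  # one failing source should not sink the others
            return name, f"ERROR: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, sources.items()))


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs without a network.
    if "broken" in url:
        raise IOError("source unavailable")
    return f"results for {url}"


sources = {
    "source_a": "http://a.example/search?q={terms}",
    "source_b": "http://b.example/find?query={terms}",
    "source_c": "http://broken.example/?q={terms}",
}
results = broadcast_query("crohns", sources, fake_fetch)
for name, text in results.items():
    print(name, "->", text)
```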
Detailed Description of a Preferred Embodiment

Referring to FIG. 1, the present invention is preferably implemented as a software 
application 10 executed at least primarily by a client computer 12 connected to a wide area 
network 16 (such as the Internet) including a plurality of client computers and server 
computers 14. Application 10 stores and accesses at least one single subject database. The 
majority of the single subject database entries comprise hierarchical listings of hidden web 
databases or sources, all entries being organized by subject matter and each including a 
description of database content, URL information to locate the database and a search protocol 
for the database, such as a term entry interface customized for the particular database access 
page format. Application 10 obtains indirectly accessible information by issuing queries to the 
listed hidden web databases. The single subject database entries also comprise listings for 
search engines hosted at a dedicated search server 17. By routing queries through the 
dedicated search server, application 10 obtains directly accessible information from the visible
web. Application 10 also provides a timing interface 18, illustrated in FIG. 2, for the user to 
set times (such as by the hour or the day of the week) for the client to monitor the results of a 
specific hidden web database or visible web query (preferably executed through the search 
engine provided by client 12 for the desired hidden database or databases, or the user's desired 
visible web search terms). 
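
One way to picture the schedule check behind such a timing interface is a small function that decides whether a monitored query is due. The schedule layout (`every_hours`, `weekly`) and the function name are assumptions for this sketch, not part of the specification.

```python
from datetime import datetime, timedelta


def is_due(schedule, last_run, now):
    """Return True if a monitored query should run at `now`.

    `schedule` is a hypothetical dict holding either
      {"every_hours": N}              run when N hours have passed since last_run, or
      {"weekly": {weekday: "HH:MM"}}  run once that day's time has passed,
    where weekday follows datetime.weekday() (Monday is 0).
    """
    if "every_hours" in schedule:
        interval = timedelta(hours=schedule["every_hours"])
        return last_run is None or now - last_run >= interval
    slot = schedule.get("weekly", {}).get(now.weekday())
    if slot is None:
        return False
    hour, minute = map(int, slot.split(":"))
    due_at = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Due if today's slot has passed and no run has happened since it.
    return now >= due_at and (last_run is None or last_run < due_at)


# 3 January 2024 is a Wednesday (weekday 2).
now = datetime(2024, 1, 3, 12, 20)
weekly = {"weekly": {2: "12:15"}}
print(is_due(weekly, None, now))
print(is_due(weekly, datetime(2024, 1, 3, 12, 16), now))
```

A client could call a function like this at start-up and again at each polling interval, running the stored query whenever it returns True.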

Client 12 preferably stores the user's preferred monitoring schedule on a hard drive or
similar stable memory local to the client and checks the schedule every time client application 
10 is activated, as well as at predetermined intervals (e.g., every 15 minutes) thereafter while 
application 10 is activated. If a schedule check reveals query results are due to be monitored, 
client 12 obtains indirectly accessible information by sending the user's desired query to the 
desired sites from the database and directly accessible information by sending the query to the 
search engine server dedicated to a specific group of visible web sites, and retrieves the 
results. Client 12 is then preferably directed by application 10 to compare new results to 
previously retrieved results using a difference algorithm, and to display the difference in 
HTML format on a current results viewing page. In the case of a visible web query, a server 
17 dedicated to visible web search functions (hosted by a service provider such as Citizen 1, 
Inc., the assignee of the present invention) is preferably directed by client 12 to do a previous 
Extraneous information, such as advertising banners, is preferably removed to allow the user
to focus on new results. An example of a result comparison HTML display is provided in 
FIG. 5. If the query results have not changed, client 12 notifies the user. If the query results 
have changed, client 12 notifies the user and creates an HTML document which displays the 
differences between the old and new query results and highlights the differences in the body of 
the text on the most recent HTML results page. The provided results page preferably also 
provides link elements within the text to navigate between each of the differences and links to 
view previous and current results. Client 12 then provides the user a mechanism to view 
results within a browser, and replaces a previously cached HTML results document (and 
related graphics) with a current results document. The client application finally caches the 
most recent query results, and provides means for the user to view the most recent results. 
Client 12 will preferably only compare a newest scheduled search result to a first search or 
subsequent, most recently changed result. 
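
The comparison and highlighted display described above might look like the following sketch. The specification does not name a particular difference algorithm; Python's standard `difflib.SequenceMatcher` is swapped in here for illustration, and the function name, the choice of HTML tags, and the anchor naming are assumptions.

```python
import difflib
import html


def compare_results(old_lines, new_lines):
    """Compare cached and new result lines.

    Returns (changed, html_fragment). Removed lines are wrapped in <del>,
    new lines in <mark>, and each difference gets a named anchor so a page
    could link from one difference to the next.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    parts, changed, count = [], False, 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(f"<p>{html.escape(line)}</p>" for line in new_lines[j1:j2])
            continue
        changed = True
        count += 1
        parts.append(f'<a name="diff{count}"></a>')
        parts.extend(f"<p><del>{html.escape(line)}</del></p>" for line in old_lines[i1:i2])
        parts.extend(f"<p><mark>{html.escape(line)}</mark></p>" for line in new_lines[j1:j2])
    return changed, "\n".join(parts)


old = ["Crohn's Disease Web Page", "Old resource"]
new = ["Crohn's Disease Web Page", "New resource"]
changed, fragment = compare_results(old, new)
print(changed)
print(fragment)
```

When `changed` is False the client can simply notify the user that nothing is new; otherwise the fragment can be embedded in the results page and cached in place of the previous one.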

The preferred process for query comparison and difference display for the visible web 
is largely similar to the above-described process for hidden databases, save for the difference 
comparison. If a particular query has not already been executed, client 12 formats and sends 
the query to server 17 at a next predetermined time interval. Server 17 then sends an HTML 
result page and results summary document back to client, in response to which client provides 
to user the usual means for viewing these results in a Web browser, and client caches both an 
HTML results page and a summary document. At each subsequent monitoring time, client 12 
formats and sends both query and a previous result summary document to server 17, which
uses the previous summary document and current query summary document to compare
current query results to previous results, and sends an HTML-formatted changed results page
back to client 12 (thus, the page displays only new or different results, not unchanged results). 
Client 12 then provides the customary means for the user to view results in a Web browser, 
and client caches the newest HTML results page and newest summary document for later 
comparison. Server 17 may also be configured to maintain a user's query and search 
preferences and run the monitoring functions automatically. Server 17 can then notify user of 
any changed results by communicating directly with client 12 during the next execution of the 
application, by email, or by network independent methods such as paging or automated phone 
embodiments may be provided. Specifically, the invention has been described with a view
towards implementations using the Internet as the WAN. As such, the preferred means of
storing and displaying information is in HTML form. However, the invention is suitable for
other WAN applications and the specific implementations may be tailored appropriately. 
Further, a variety of data comparison algorithms may be utilized to increase system throughput 
and are clearly within the scope and spirit of this description. Such other embodiments are 
intended to fall within the scope of the present invention. Consequently, the above description 
is intended to be exemplary only. 
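
As one illustration of the data comparison algorithms mentioned above, the server-side comparison described for visible web queries could be sketched by treating the summary document as a map from each result's URL to a short digest of its text. The summary format, the function names, and the example URLs are assumptions; the description does not specify how a summary document is built.

```python
import hashlib


def summarize(results):
    """Build a summary document: result URL -> short digest of the result text."""
    return {url: hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
            for url, text in results.items()}


def changed_results(previous_summary, current_results):
    """Return (results that are new or different, summary to cache for next time)."""
    current_summary = summarize(current_results)
    changed = {url: text for url, text in current_results.items()
               if previous_summary.get(url) != current_summary[url]}
    return changed, current_summary


first_run = {"http://a.example/1": "alpha", "http://a.example/2": "beta"}
previous = summarize(first_run)

second_run = {
    "http://a.example/1": "alpha",      # unchanged, so it is omitted from the changed set
    "http://a.example/2": "beta v2",    # text differs from the earlier run
    "http://a.example/3": "gamma",      # not seen before
}
changed, new_summary = changed_results(previous, second_run)
print(sorted(changed))
```

Because only the compact summary travels with the query, the full text of earlier results does not need to be resent for the comparison.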
5. A computer-readable medium containing instructions for controlling a
computer to automate searching of a wide area network, by: 

a) maintaining a database related to a desired subject, wherein the 
database comprises a plurality of entries, each entry locating a source for indirectly and 
directly accessible information and a search protocol for the source; 

b) issuing a query to desired sources; 

c) retrieving the results of the query; and 

d) displaying the results. 

6. The computer-readable medium of claim 1, further comprising: 

e) issuing the query to the desired sources after a given time interval; 

f) retrieving the results of the query issued in step e); 

g) comparing the results of the query issued in step e) to the results of the 
query in step c); and 

h) displaying any changes determined in step g). 

Inventors:

Timothy G. Bratsos 
Peter J. R. Bonney 
Lynn W. Barr 

Dialog                                            Code  Description

Main dialog, Help submenu                         C314  Help submenu of Main application dialog.
Main dialog, Category windows                     C320  Category windows of Main application dialog.
Main dialog, Category/Bookmarks windows           C322  Category/Bookmarks windows of the Main application dialog.
Main dialog, Bookmark Actions submenu             C324  Bookmark Actions submenu of the Main application dialog.
Main dialog, Site Description window              C330  Site Description window of Main application dialog.
Main dialog, Site Description/Bookmarks window    C332  Bookmark Provider windows of the Main application dialog.
Main dialog, Search Results window                C340  Search Results window of Main application dialog.
Search Results window submenu                     C345  Search Results window submenu.
E-mail Search Results dialog                      C347  E-mail Search Results dialog.
Site Monitor Schedule dialog                      C370  Used to schedule Site Monitors.
Site Monitor, Custom Schedule dialog              C373  Used to schedule Site Monitors for specific dates/times.
Site Monitor, E-mail Notification Options dialog  C375  Used to set e-mail notification options for a monitored site.
Export Bookmarks dialog                           C380  Used to Export a bookmark Folder or a single resource.
Import Bookmarks dialog                           C390  Used to Import bookmarks.
Setup dialog                                      C400  Main setup dialog.
Configure Proxies dialog                          C425  Reached from Setup dialog, configures proxies.
Configure Cache dialog                            C450  Reached from Setup dialog, configures cache for multi-user operation.
Registration dialog                               C500  Reached from Setup dialog and at start up (if registration id has expired), allows entry of a new registration id.
Find dialog                                       C600  Locate resources based on keywords.
About dialog                                      C700  About dialog.
How To Use CiteLine dialog                        C800  Dialog with quick help on how to use CiteLine.
How To Contact Citizen 1 dialog                   C850  Dialog listing phone numbers and addresses for Citizen 1 offices.
Toolbar dialog                                    C900  Used for navigating and viewing search results.

[Figure: Site Monitor dialog]

Schedule
    Select when you would like the search to be performed.
    Remember CiteLine MUST be running for scheduled searches to run.
    Options: Don't schedule search; Every time CiteLine starts up; Every [n] hours;
    on selected days of the week at a set time (AM/PM); Every [n] day of the month;
    On [date] at [time].

Monitoring Options
    Only show search results if they are different from last time
    Always show search results in search results window

Buttons: OK, Cancel

[Figure: search results page displayed in a web browser window. Address: C:\ProgramFiles\CiteLine Provi]

INFOMINE Query Results 



Biological, Agricultural & Medical Sciences 



Query: Crohn's 

Number of Resources Found: J 
Crohn's Disease Web Page 



(Click for terms leading to related resources)



"This site provides general information about Crohn's Disease (CD) 
with pointers to helpful books and resources and a concise descriptive 
list of Crohn's and medical related links for those who want to learn 
more. It is essentially a guide for patients, family, friends, and others to 
the most important CD resources." 